Sixth International Joint Conference on Natural Language Processing Proceedings of the 11th Workshop on Asian Language Resources

Authors

  • Laxmi Kashyap
  • Malhar Kulkarni
Abstract

Bilingual corpora are important resources not only for machine translation research and development but also for studying tasks in comparative linguistics. Manual annotation of word alignments is important because it provides a gold standard for developing and evaluating machine translation models and for comparative linguistics tasks. This paper presents the construction of an English-Vietnamese parallel corpus intended for building a Vietnamese-English machine translation system. We describe the specifications for collecting data for the corpus, for linguistic tagging, and for bilingual annotation, as well as the tools developed specifically for manual annotation. An English-Vietnamese bilingual corpus of over 800,000 sentence pairs and 10,000,000 English words as well as Vietnamese words has been collected and aligned at the sentence level, and over 45,000 sentence pairs of this corpus have been aligned at the word level. Moreover, these 45,000 sentence pairs have been annotated with additional linguistic tags, including word segmentation for the Vietnamese text, chunk tags, and named entity tags.
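The abstract says manual word alignments serve as a gold standard for evaluating machine translation models. As a minimal sketch of how such a gold standard is commonly used, and not the authors' own method, the Python below scores a predicted alignment against gold links using precision, recall, and the alignment error rate (AER) of Och and Ney. The Pharaoh-style `i-j` link format and the function names `parse_links` and `alignment_scores` are assumptions for illustration. The paper's actual annotation file format is not described here.

```python
from __future__ import annotations

# Hypothetical sketch: compare a predicted word alignment with a gold-standard
# manual alignment. Links are assumed to use the Pharaoh-style "i-j" format,
# where i is a source-token index and j is a target-token index.


def parse_links(line: str) -> set[tuple[int, int]]:
    """Parse a line such as '0-0 1-2' into a set of (source, target) index pairs."""
    links = set()
    for token in line.split():
        src, tgt = token.split("-")
        links.add((int(src), int(tgt)))
    return links


def alignment_scores(predicted, sure, possible=None):
    """Return (precision, recall, AER) for a predicted alignment.

    `sure` holds gold links the annotators are certain of. `possible` optionally
    adds ambiguous gold links. If it is omitted, possible links equal sure links.
    AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).
    """
    if possible is None:
        possible = sure
    possible = possible | sure  # sure links must also count as possible links
    a_and_s = len(predicted & sure)
    a_and_p = len(predicted & possible)
    precision = a_and_p / len(predicted) if predicted else 0.0
    recall = a_and_s / len(sure) if sure else 0.0
    denom = len(predicted) + len(sure)
    aer = 1.0 - (a_and_s + a_and_p) / denom if denom else 0.0
    return precision, recall, aer


if __name__ == "__main__":
    # English "I love Vietnam" vs. segmented Vietnamese "Tôi yêu Việt_Nam"
    # (the underscore joining a multi-syllable word is an assumed convention).
    gold = parse_links("0-0 1-1 2-2")
    pred = parse_links("0-0 1-1 1-2")
    print(alignment_scores(pred, gold))
```

In this example the predicted alignment wrongly links "love" to "Việt_Nam" and misses the link for "Vietnam". Precision and recall are therefore both 2/3, and AER is 1 - 4/6 = 1/3. Keeping separate sure and possible link sets lets annotators mark uncertain correspondences without penalizing a system that omits them.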


Similar resources

Sixth International Joint Conference on Natural Language Processing Proceedings of the Fourth Workshop on South and Southeast Asian Natural Language Processing

This paper deals with the fast bootstrapping of a grapheme-to-phoneme (G2P) conversion system, a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-resourced language (Iban, spoken in Sarawak and in several parts of the island of Borneo) for which no res...


Sixth International Joint Conference on Natural Language Processing The First Workshop on Natural Language Processing for Medical and Healthcare Fields

This paper describes a method for extracting medical information from texts. The method aims to extract complaints and diagnoses from electronic health record texts. Complaints and diagnoses are fundamental information and can be used for more complex medical tasks. The method uses several medical knowledge resources to improve extraction performance. With an evaluation using NTCIR10 ...


Sixth International Joint Conference on Natural Language Processing Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing

In this talk, we give a systematic view of the lexical semantics of the Chinese language. From a macro perspective, lexical conceptual meanings are classified into hierarchical semantic types, and each type plays particular semantic functions (Host, Attribute, and Value) to form a semantic compositional system. Lexical senses and their compositional functions will be exemp...


Sixth International Joint Conference on Natural Language Processing Proceedings of the Workshop on Natural Language Processing for Social Media (SocialNLP)

In Taiwan, there are different types of TV programs, and each program usually has its own broadcast length and frequency. We collect word-of-mouth on Facebook about broadcast TV programs and apply a backpropagation network to predict the latest program audience rating. TV audience rating is an important indicator of program popularity, and it is also a factor that influences the rev...


Role of the Language School’s Principals in Academic Achievements

This study investigated the role of the principal in managing teaching and learning. It examined how, and to whom, principals distributed the management of teaching and learning. Participants thought principals could improve school effectiveness most by engaging in activities that develop a good climate and ensure that appropriate resources are available for instruction. There is a ...


Publication date: 2013